ABSTRACT


We introduce a stricter Web community definition to overcome the boundary ambiguity of the Web community defined by Flake, Lawrence and Giles [2], and consider the problem of finding communities that satisfy our definition. We discuss how to find such communities and the hardness of this problem.

We also propose Web page partitioning by an equivalence relation defined using the class of communities of our definition. Although the problem of finding all communities of our definition is NP-complete, we propose an efficient method of finding a subclass of communities among the sets partitioned by each of the n − 1 cuts represented by a Gomory-Hu tree [10], and of partitioning a Web graph by the equivalence relation defined using this subclass.

According to our preliminary experiments, partitioning by 
our method divided the pages retrieved by keyword search 
into several different categories to some extent. 


Categories and Subject Descriptors 


H.2.8 [Database Management]: Database Applications—data mining; H.3.3 [Information Systems]: Information Storage and Retrieval—clustering, information filtering; G.2.2 [Discrete Mathematics]: Graph Theory—network problems


General Terms 


Algorithms, Experimentation 


Keywords 


Web community, graph partitioning, maximum flow algorithm


1. INTRODUCTION 


The World-Wide Web (WWW) is a huge network in which pages are connected by links. This link structure holds a great deal of valuable information about the relationships between Web pages, because links are created by people to guide readers to related pages. This information has already been used in many applications, for example, the ranking of pages retrieved by a search engine [6, 1].
One of the interesting substructures of the WWW network is the community structure, a structure of subgraphs with dense connections. A set of pages having such a structure, a Web community, is presumably created by people sharing the same interests. Therefore, discovered communities can be used for Web page search and recommendation.

There are two main definitions of communities as dense subgraphs. The first, proposed by Kumar et al. [7], defines a community as a dense directed bipartite subgraph that contains a complete bipartite subgraph of a certain size. However, this definition is ambiguous because it still relies on the word ‘dense’. The definition would become precise if a community were defined as a complete bipartite subgraph itself, but then most communities would be too small and most pages would not belong to any community. The second, proposed by Flake, Lawrence and Giles [2], defines a community as a vertex subset in which each member vertex has at least as many edges connecting to member vertices as it does to non-member vertices. This definition is precise: for every vertex subset, it is possible to decide whether it is a community or not. Moreover, large communities are likely to exist under this definition, and most pages seem to belong to some community other than the whole vertex set.

In order to find communities as defined by Flake et al., which we call FLG-communities here, two methods have been proposed so far. Rather than searching for densely connected vertex subsets directly, both look for subsets that are sparsely connected to the vertices outside them. One is a method based on edge betweenness, proposed by Girvan and Newman [4]. The edge betweenness of an edge is defined as the number of shortest paths between pairs of vertices that run along it. Based on the idea that the edges connecting the inside and the outside of a community are expected to have high edge betweenness, Girvan and Newman proposed a method that repeatedly removes an edge with the highest edge betweenness. However, their method is not guaranteed to find FLG-communities¹. The other is a method using the maximum flow algorithm, proposed by Flake et al. [2]. The maximum flow algorithm [9, 10] calculates how much water must run through each edge in order to maximize the total amount of water running from one vertex (the source) to another vertex (the sink), under the condition that the amount of water through each edge is at most its given capacity. The method using the maximum flow algorithm is based on the idea that the sparse edges between the inside and the outside of an FLG-community become a bottleneck for the flow from an inside vertex to an outside vertex when the capacity of every edge is one. A subset of the saturated edges, that is, the edges through which an amount of water equal to their capacity is running, gives a cut that divides the set of all vertices into two sets: a set containing the source and a set containing the sink. Ford and Fulkerson’s “max flow-min cut” theorem [9, 10] guarantees that this cut contains the minimum number of edges among all cuts separating the source and the sink. Flake et al. claimed that, in order to be identified by the maximum flow algorithm, an FLG-community $C$ must satisfy the condition that both $s^*$ and $t^*$ are larger than the number of edges in the cut $(C, V - C)$, where $s^*$ is the number of edges between the source $s$ and vertices in $C$, and $t^*$ is the number of edges between the sink $t$ and vertices in $V - C$.

¹For example, in Figure 2, their method separates $C_3$ from $V - C_3$ and does not find the FLG-community $C_4$.

There are two problems with the method proposed by Flake et al. First, the boundary of an FLG-community is ambiguous; that is, many slightly different subsets can be FLG-communities for one densely connected part of a graph. Second, a vertex set found by a maximum flow algorithm might not be an FLG-community even if an FLG-community $C$ satisfying the above condition exists.

To overcome the first problem, this paper introduces a community definition stricter than the FLG-community definition. In the FLG-community definition, only the inside vertices of a community are constrained. In our definition, each outside vertex must also have at least as many edges connecting to outside vertices as it does to inside vertices. (We call an FLG-community satisfying this condition an IKN-weak-community.) In addition, inside vertices must satisfy a stricter condition: each inside vertex has more edges connecting to inside vertices than it does to outside vertices. (We call an IKN-weak-community satisfying this condition an IKN-community.) The definition of IKN-community reduces the boundary ambiguity of FLG-communities to some extent, because two distinct IKN-communities differ in at least two vertices.

Clarifying what can be found by an s-t maximum flow algorithm addresses the second problem. We prove that what an s-t maximum flow algorithm finds is not necessarily an FLG-community but an s-t quasi-IKN-community, in which all vertices except $s$ and $t$ satisfy the conditions of an IKN-community. In terms of FLG-communities, this means that a maximum flow algorithm always finds a vertex set whose members, except the source $s$, satisfy the condition of a strict² FLG-community; this has already been proved by Flake, Tarjan and Tsioutsiouliklis [3]. So an s-t maximum flow algorithm might seem to find an FLG-community approximately. However, the fact that the source might not satisfy the condition should not be neglected, because the source is the most important vertex, the one that must be contained in the community. On the other hand, the fact that all members of a found set except the source $s$ (and the sink $t$) satisfy the conditions of an FLG-community (IKN-community) ensures that only $s$ (and $t$) need to be checked to decide whether the set is an FLG-community (IKN-community) or not.


An efficient algorithm that finds not an s-t quasi-IKN-community but an s-t IKN-community (an IKN-community that includes $s$ and excludes $t$) is not known yet. To illustrate the difference between the problem of finding an s-t IKN-weak-community and the problem of finding an s-t quasi-IKN-community, which can be solved efficiently by an s-t maximum flow algorithm, we show that every coordinate of every extreme point solution is an integer for the LP-relaxation of the integer program formalizing the latter problem, but might not be an integer for the LP-relaxation of the integer program formalizing the former problem. This seems to support the hardness of finding an s-t IKN-community. Indeed, very recently, the problem of deciding whether an s-t IKN-community (IKN-weak-community) exists for given $s$ and $t$ has been proved to be NP-complete even if weights are restricted to 1 [8]. The problem of deciding whether an IKN-community (IKN-weak-community) exists in a given graph has also been proved to be NP-complete [3], which implies the NP-completeness of the above s-t IKN-community (IKN-weak-community) problem without restrictions on weights.

²See the definition in Sec. 2.1.

Partitioning Web pages into groups with similar properties is useful for information retrieval, and Web communities can be used for this purpose. We propose Web page partitioning by the equivalence relation defined using the class of IKN-communities, where two pages are equivalent if and only if the sets of IKN-communities containing them coincide. This equivalence relation can also be defined using FLG-communities, but the resulting partition may not be useful because of the boundary ambiguity problem described above. We also propose hierarchical partitioning, which repeatedly applies this partitioning to the contracted graph in which all original vertices in the same partition are contracted into one vertex.

In order to partition a Web graph by the equivalence relation defined using IKN-communities, the existence of s-t IKN-communities must be checked for arbitrary vertices $s$ and $t$, which is an NP-complete problem as described above. In this paper, we propose a coarser partitioning by the equivalence relation defined using a subclass of IKN-communities that can be extracted efficiently by an s-t maximum flow algorithm. As mentioned above, sets found by an s-t maximum flow algorithm are s-t quasi-IKN-communities, so only a further check of whether $s$ and $t$ satisfy the requirements of an IKN-community is needed. Our method finds IKN-communities among all $2(n-1)$ vertex subsets produced by the $n-1$ cuts represented by a Gomory-Hu tree [10]. A Gomory-Hu tree of a connected graph $G(V, E)$ is a tree on the vertex set $V$ in which every cut $(C, V - C)$ obtained by removing one tree edge $(s, t)$ is also an s-t minimum cut in $G$. It is known that a Gomory-Hu tree can be created efficiently by executing a maximum flow algorithm $n-1$ times.

According to our preliminary experiments, partitioning by 
our method divided the pages retrieved by keyword search 
into several different categories to some extent. 


2. WEB COMMUNITIES 


2.1 Definition 


Let $G(V, E)$ be an undirected graph in which each vertex represents a Web page and each edge represents a link between two distinct pages. Assume that a weight $w_{uv} (= w_{vu}) \ge 0$ is given to each pair of vertices $u$ and $v$, where $w_{uv} > 0$ if there is an edge between $u$ and $v$, and $w_{uv} = 0$ otherwise. The weight $w_{uv}$ of a pair joined by an edge can be any positive value, but we assume that it is 1 unless explicitly stated otherwise.

Flake, Lawrence and Giles [2] defined a Web community, which we call an FLG-community here, in terms of undirected graphs as follows.


DEFINITION 1 (FLAKE, LAWRENCE AND GILES [2]). An FLG-community is a vertex subset $C \subseteq V$ that satisfies the following Condition 1.

CONDITION 1. $$\sum_{v \in C} w_{uv} \ge \sum_{v \in V - C} w_{uv} \quad \text{for all } u \in C.$$

Figure 1: Example of FLG-communities $C_1, C_2, \ldots, C_5$.


This definition has the problem of boundary ambiguity. For example, in the graph of Figure 1, $C_1, C_2, C_3, C_4$ and $C_5$ are all FLG-communities, though they correspond to essentially the same densely connected part of the graph. This is a problem when we want to extract one community for each densely connected part. Therefore, we propose a stricter definition as follows.


DEFINITION 2. An IKN-weak-community is a vertex subset $C \subseteq V$ that satisfies Condition 1 and the following Condition 2.

CONDITION 2. $$\sum_{v \in C} w_{uv} \le \sum_{v \in V - C} w_{uv} \quad \text{for all } u \in V - C.$$

DEFINITION 3. An IKN-community is a vertex subset $C \subseteq V$ that satisfies the following Condition 1′ and Condition 2.

CONDITION 1′. $$\sum_{v \in C} w_{uv} > \sum_{v \in V - C} w_{uv} \quad \text{for all } u \in C.$$


Flake, Tarjan and Tsioutsiouliklis [3] consider a community definition using Condition 1′ only. Here, we call such communities strict FLG-communities.


Note that
$$\{C : C \text{ is an IKN-community}\} \subseteq \{C : C \text{ is an IKN-weak-community}\} \subseteq \{C : C \text{ is an FLG-community}\}.$$
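To make Conditions 1, 1′ and 2 concrete, here is a minimal Python sketch of a membership test for the four community notions. It assumes the weighted undirected graph is stored as a symmetric dict of dicts with `w[u][v]` equal to $w_{uv}$; this representation and the helper names (`from_edges`, `split`, `classify`) are our own illustrative choices, not part of the original method.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Symmetric adjacency map: w[u][v] == w[v][u] == w_uv > 0 for every edge.
Graph = Dict[Hashable, Dict[Hashable, float]]


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build a unit-weight undirected graph (w_uv = 1) from an edge list."""
    w: Graph = {}
    for u, v in edges:
        w.setdefault(u, {})[v] = 1
        w.setdefault(v, {})[u] = 1
    return w


def split(w: Graph, u: Hashable, C: Set) -> Tuple[float, float]:
    """Return (sum of w_uv over v in C, sum of w_uv over v in V - C)."""
    inside = sum(x for v, x in w[u].items() if v in C)
    return inside, sum(w[u].values()) - inside


def condition1(w: Graph, C: Set) -> bool:
    """Condition 1: each member has at least as much weight inside as outside."""
    return all(a >= b for a, b in (split(w, u, C) for u in C))


def condition1_strict(w: Graph, C: Set) -> bool:
    """Condition 1': each member has strictly more weight inside than outside."""
    return all(a > b for a, b in (split(w, u, C) for u in C))


def condition2(w: Graph, C: Set) -> bool:
    """Condition 2: each non-member has at most as much weight to C as to V - C."""
    return all(a <= b for a, b in (split(w, u, C) for u in w if u not in C))


def classify(w: Graph, C: Set) -> Dict[str, bool]:
    return {
        "FLG": condition1(w, C),
        "strict FLG": condition1_strict(w, C),
        "IKN-weak": condition1(w, C) and condition2(w, C),
        "IKN": condition1_strict(w, C) and condition2(w, C),
    }


# Two triangles {0,1,2} and {3,4,5} joined by the single edge (2,3).
g = from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(classify(g, {0, 1, 2}))  # all four flags are True
print(classify(g, {0, 1}))     # FLG only: vertex 2 violates Condition 2
```

On this toy graph, $\{0, 1\}$ passes Condition 1 but fails Condition 2 (vertex 2 has two edges into the set and one out of it), which illustrates how the stricter definition rules out some of the ambiguous FLG boundaries.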


The whole set $V$ and the empty set $\emptyset$ are trivial IKN-communities. Among the five FLG-communities $C_1, C_2, \ldots, C_5$ in Figure 1, the vertex subsets $C_3$, $C_4$ and $C_5$ are IKN-weak-communities, and only $C_3$ is an IKN-community. Note that $C_1$, $C_2$ and $C_3$ are strict FLG-communities. The following proposition ensures that if $C$ is an IKN-community, then neither $C \cup \{v\}$ for any $v \notin C$ nor $C - \{v\}$ for any $v \in C$ is an IKN-community, which means that boundary ambiguity is reduced.


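PROPOSITION 1. For any two distinct IKN-communities $C_1$ and $C_2$, their symmetric difference contains at least two vertices.

PROOF. Suppose, to the contrary, that the symmetric difference consists of a single vertex $u$; without loss of generality, $u \notin C_1$ and $C_2 = C_1 \cup \{u\}$. Since $C_2$ satisfies Condition 1′,
$$\sum_{v \in C_1} w_{uv} = \sum_{v \in C_2} w_{uv} > \sum_{v \in V - C_2} w_{uv} = \sum_{v \in V - C_1} w_{uv},$$
which contradicts the assumption that $C_1$ satisfies Condition 2. □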
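Proposition 1 can be checked mechanically on a toy graph by brute force. The sketch below is our own illustration, not the authors' method: it enumerates all $2^n$ vertex subsets, keeps those satisfying Condition 1′ and Condition 2, and asserts that distinct IKN-communities differ in at least two vertices. It assumes a unit-weight graph stored as a symmetric dict of dicts, and the exponential enumeration is only meant for a handful of vertices.

```python
from itertools import combinations


def from_edges(edges):
    """Unit-weight undirected graph as a symmetric dict of dicts."""
    w = {}
    for u, v in edges:
        w.setdefault(u, {})[v] = 1
        w.setdefault(v, {})[u] = 1
    return w


def is_ikn(w, C):
    """Condition 1' for members and Condition 2 for non-members."""
    for u in w:
        inside = sum(x for v, x in w[u].items() if v in C)
        outside = sum(w[u].values()) - inside
        if u in C and not inside > outside:
            return False
        if u not in C and not inside <= outside:
            return False
    return True


def all_ikn_communities(w):
    """Exhaustive search over all 2^n subsets: only usable on tiny graphs."""
    V = sorted(w)
    return [set(S) for r in range(len(V) + 1)
            for S in combinations(V, r) if is_ikn(w, set(S))]


g = from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
communities = all_ikn_communities(g)
print(communities)  # [set(), {0, 1, 2}, {3, 4, 5}, {0, 1, 2, 3, 4, 5}]
# Proposition 1: distinct IKN-communities differ in at least two vertices.
assert all(len(a ^ b) >= 2 for a, b in combinations(communities, 2))
```

For two triangles joined by one edge, only the two triangles and the trivial sets $\emptyset$ and $V$ qualify.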


2.2 How to Find Web Communities 


Figure 2: C2 and C4 are communities that are not 
found by an s-t maximum flow algorithm. 


Flake, Lawrence and Giles proposed a method that uses a maximum flow algorithm to find an FLG-community. As they showed, an s-t maximum flow algorithm clearly fails to identify an FLG-community $C$ that includes a vertex $s$ and excludes a vertex $t$ when the number of edges between $C$ and $V - C$ is larger than the number $s^*$ of edges between $s$ and $C - \{s\}$, or the number $t^*$ of edges between $t$ and $V - C - \{t\}$. Then, when the number of edges between $C$ and $V - C$ is smaller than both $s^*$ and $t^*$, can $C$ be identified by an s-t maximum flow algorithm? The answer is no. In the case of the left graph in Figure 2, $C_1$ and $C_2$ satisfy the condition, but the only FLG-community found by an s-t maximum flow algorithm is $C_1$. In the case of the right graph in Figure 2, $C_4$ satisfies the condition, but the found set is $C_3$, which is not even an FLG-community.

These examples indicate the following two facts for communities $C$ that include $s$, exclude $t$, and have fewer edges between $C$ and $V - C$ than both $s^*$ and $t^*$.

• Not all such communities $C$ can be found by an s-t maximum flow algorithm.

• Non-FLG-communities can be found by an s-t maximum flow algorithm even if such communities $C$ exist.

The second fact makes us reluctant to use the algorithm for finding FLG-communities.

In the following, we clarify what can be found by a maximum flow algorithm.

Let $G' = (V, E')$ be the directed graph generated from $G$ by replacing each undirected edge $(u, v)$ with two directed edges $(u, v)$ and $(v, u)$. Given two vertices $s$ and $t$, let $f_{s,t}$ denote a maximum flow [9] from $s$ to $t$ when the capacity of each edge $(u, v)$ is $w_{uv}$. We define $S(f_{s,t})$ as the set of vertices that are reachable from $s$ in the residual graph [9] of $G'$ for $f_{s,t}$. Here, the residual graph of $G' = (V, E')$ for a flow $f$ is the graph composed of all vertices and the edges $(u, v)$ with positive residual capacity $r_{uv} = w_{uv} - f(u, v) + f(v, u)$.

We call an IKN-community that includes $s$ and excludes $t$ an s-t IKN-community. If a subset of vertices that includes $s$ and excludes $t$ satisfies Condition 1′ and Condition 2 for all vertices except $s$ and $t$, then we call the subset an s-t quasi-IKN-community. Then the following proposition holds.

PROPOSITION 2. $S(f_{s,t})$ is an s-t quasi-IKN-community.³

PROOF. Let $u \in S(f_{s,t})$ and $u \ne s$, and write $S = S(f_{s,t})$. Since there is no edge from $u$ to any vertex in $V - S$ in the residual graph for $f_{s,t}$, $f_{s,t}(u, v) = w_{uv}$ and $f_{s,t}(v, u) = 0$ hold for all $v \in V - S$. Thus, using flow conservation at $u$ (note that $u \ne t$ because $t \notin S$),
$$\begin{aligned}
\sum_{v \in V - S} w_{uv} &= \sum_{v \in V - S} f_{s,t}(u, v) \le \sum_{v \in V} f_{s,t}(u, v) = \sum_{v \in V} f_{s,t}(v, u) \\
&= \sum_{v \in S} f_{s,t}(v, u) \le \sum_{v \in S} w_{vu} = \sum_{v \in S} w_{uv}
\end{aligned}$$
holds. If the equality $\sum_{v \in V - S} w_{uv} = \sum_{v \in S} w_{uv}$ holds, then $f_{s,t}(u, v) = 0$ and $f_{s,t}(v, u) = w_{vu}$ for all $v \in S$, which means that there is no edge from any vertex in $S$ to $u$ in the residual graph. This contradicts the fact that $u$ belongs to $S$. Thus, all vertices in $S(f_{s,t}) - \{s\}$ satisfy Condition 1′.

Similarly, it can be proved that all vertices in $V - S(f_{s,t}) - \{t\}$ satisfy Condition 2. □


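REMARK 1. For $u = s, t$, the equalities
$$\sum_{v \in V} f_{s,t}(s, v) = \sum_{v \in V} f_{s,t}(v, s) + |f_{s,t}| \quad \text{and} \quad \sum_{v \in V} f_{s,t}(t, v) + |f_{s,t}| = \sum_{v \in V} f_{s,t}(v, t)$$
hold instead of $\sum_{v \in V} f_{s,t}(u, v) = \sum_{v \in V} f_{s,t}(v, u)$, where $|f_{s,t}|$ is the value of the flow $f_{s,t}$. Therefore, $S(f_{s,t})$ might not be an s-t IKN-community.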


REMARK 2. The cut $(S(f_{s,t}), V - S(f_{s,t}))$ is one of the s-t minimum cuts [9], but for an s-t minimum cut $(C, V - C)$, the sets $C$ and $V - C$ might not be an s-t and a t-s quasi-IKN-community, respectively. For example, the cut $(C_4, V - C_4)$ in Figure 1 is an s-t minimum cut, but neither side is an s-t or t-s quasi-IKN-community, because $u$ for $C_4$ and $v$ for $V - C_4$ do not satisfy Condition 1′. However, for every s-t minimum cut $(C, V - C)$, $C$ is an s-t quasi-IKN-weak-community and $V - C$ is a t-s quasi-IKN-weak-community, where an s-t quasi-IKN-weak-community is a vertex subset that includes $s$, excludes $t$, and satisfies Condition 1 and Condition 2 for all vertices except $s$ and $t$.


Note that Proposition 2 indicates that we only have to check that $s$ satisfies Condition 1′ and $t$ satisfies Condition 2 in order to know whether $S(f_{s,t})$ is an IKN-community or not.

As shown above, an s-t maximum flow algorithm only guarantees that a found set is an s-t quasi-IKN-community, and no efficient algorithm that finds an s-t IKN-community is known yet. Note that s-t quasi-IKN-communities always exist, but s-t IKN-communities might not.


³A part of this proposition, namely the claim that all vertices in $S(f_{s,t}) - \{s\}$ satisfy Condition 1′, has already been proved by Flake, Tarjan and Tsioutsiouliklis [3].
In the following, we give one piece of evidence that the problem of finding an s-t IKN-community is computationally hard. The problem can be formalized as the following integer program, which is obtained by adding constraints to an integer program [10] formalizing the minimum cut problem; the LP-relaxation of that minimum cut program is the LP dual of a linear program formalizing the maximum flow problem.


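PROBLEM 1. Minimize $\sum_{(u,v) \in E'} w_{uv} d_{uv}$
subject to
$$d_{uv} - p_u + p_v \ge 0 \quad \text{for } (u,v) \in E' \qquad (1)$$
$$p_s - p_t \ge 1 \qquad (2)$$
$$-\sum_{v:(u,v) \in E'} w_{uv} d_{uv} \ge -\frac{1}{2} \sum_{v:(u,v) \in E'} w_{uv} \quad \text{for } u \in V \qquad (3)$$
$$-\sum_{u:(u,v) \in E'} w_{uv} d_{uv} \ge -\frac{1}{2} \sum_{u:(u,v) \in E'} w_{uv} \quad \text{for } v \in V \qquad (4)$$
$$d_{uv} \in \{0, 1\} \quad \text{for } (u,v) \in E' \qquad (5)$$
$$p_u \in \{0, 1\} \quad \text{for } u \in V \qquad (6)$$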


The following proposition holds. 


PROPOSITION 3. Problem 1 has a feasible solution if and only if an s-t IKN-weak-community exists.


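PROOF. ($\Rightarrow$) Let $\{p^*_u, d^*_{uv} : u \in V, (u,v) \in E'\}$ be an optimal solution, and let $C = \{u : p^*_u = 1\}$. Note that $s \in C$ and $t \notin C$ because $p^*_s - p^*_t \ge 1$. Then, optimality implies that
$$d^*_{uv} = \begin{cases} 1 & \text{if } p^*_u = 1 \text{ and } p^*_v = 0, \\ 0 & \text{otherwise.} \end{cases}$$
For $u \in C$,
$$\sum_{v \in V - C} w_{uv} = \sum_{v \in V} w_{uv} d^*_{uv} \le \frac{1}{2} \sum_{v \in V} w_{uv}$$
holds by Inequality (3). Thus, $C$ satisfies Condition 1. For $v \in V - C$,
$$\sum_{u \in C} w_{uv} = \sum_{u \in V} w_{uv} d^*_{uv} \le \frac{1}{2} \sum_{u \in V} w_{uv}$$
holds by Inequality (4). Thus, $C$ satisfies Condition 2.

($\Leftarrow$) For an s-t IKN-weak-community $C$, set $p_u$ for $u \in V$ and $d_{uv}$ for $(u,v) \in E'$ as follows:
$$p_u = \begin{cases} 1 & \text{if } u \in C, \\ 0 & \text{otherwise,} \end{cases} \qquad d_{uv} = \begin{cases} 1 & \text{if } p_u = 1 \text{ and } p_v = 0, \\ 0 & \text{otherwise.} \end{cases}$$
Then $p_u$ for $u \in V$ and $d_{uv}$ for $(u,v) \in E'$ form a feasible solution of Problem 1. □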
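Because $p$ is binary, Problem 1 can be solved exactly on a toy graph by enumerating $p$; this is an illustrative brute-force sketch of our own, not a method from the paper. For fixed $p$ it takes the smallest $d$ allowed by (1) and (5), namely $d_{uv} = \max(0, p_u - p_v)$; a smaller $d$ can only help constraints (3), (4) and the objective, so no optimum is lost. The graph representation and function names are assumptions.

```python
from itertools import product


def from_edges(edges):
    """Unit-weight undirected graph as a symmetric dict of dicts w[u][v] = w_uv."""
    w = {}
    for u, v in edges:
        w.setdefault(u, {})[v] = 1
        w.setdefault(v, {})[u] = 1
    return w


def solve_problem1(w, s, t):
    """Exhaustively solve Problem 1 on a tiny graph.

    Constraint (2) with binary p forces p_s = 1 and p_t = 0.
    Returns (optimal objective, C = {u : p_u = 1}) or None if infeasible."""
    V = sorted(w)
    free = [u for u in V if u not in (s, t)]
    best = None
    for bits in product((0, 1), repeat=len(free)):
        p = dict(zip(free, bits))
        p[s], p[t] = 1, 0
        d = {(u, v): max(0, p[u] - p[v]) for u in V for v in w[u]}
        feasible = all(
            # (3): cut weight on arcs leaving u is at most half of u's weight
            sum(w[u][v] * d[u, v] for v in w[u]) <= sum(w[u].values()) / 2
            # (4): cut weight on arcs entering u is at most half of u's weight
            and sum(w[v][u] * d[v, u] for v in w[u]) <= sum(w[u].values()) / 2
            for u in V
        )
        if feasible:
            objective = sum(w[u][v] * d[u, v] for (u, v) in d)
            if best is None or objective < best[0]:
                best = (objective, {u for u in V if p[u] == 1})
    return best


g = from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(solve_problem1(g, 0, 5))                             # (1, {0, 1, 2})
print(solve_problem1(from_edges([(0, 1), (1, 2)]), 0, 2))  # None
```

In line with Proposition 3, the path $0$-$1$-$2$ with $s = 0$ and $t = 2$ has no 0-2 IKN-weak-community, and the program is infeasible. The running time is exponential in $|V|$, which is consistent with the hardness results discussed below.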


The following is the LP-relaxation of Problem 1. 


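PROBLEM 2. Minimize $\sum_{(u,v) \in E'} w_{uv} d_{uv}$
subject to
$$d_{uv} - p_u + p_v \ge 0 \quad \text{for } (u,v) \in E' \qquad (7)$$
$$p_s - p_t \ge 1 \qquad (8)$$
$$-\sum_{v:(u,v) \in E'} w_{uv} d_{uv} \ge -\frac{1}{2} \sum_{v:(u,v) \in E'} w_{uv} \quad \text{for } u \in V \qquad (9)$$
$$-\sum_{u:(u,v) \in E'} w_{uv} d_{uv} \ge -\frac{1}{2} \sum_{u:(u,v) \in E'} w_{uv} \quad \text{for } v \in V \qquad (10)$$
$$d_{uv} \ge 0 \quad \text{for } (u,v) \in E' \qquad (11)$$
$$p_u \ge 0 \quad \text{for } u \in V \qquad (12)$$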


For the corresponding LP-relaxation of the minimum cut problem, the constraints represented by Inequalities (9) and (10) are not needed, and it has been proved that every coordinate of every extreme point solution is 0 or 1 [10]; this is not true for Problem 2. Indeed, the optimal solutions of Problem 1 do not coincide with those of Problem 2, as shown in Figure 3.

As for hardness results on finding an s-t IKN-community, stronger results have been proved very recently. In [8], NP-completeness was proved for the problem of deciding whether an s-t IKN-community (IKN-weak-community) exists for given $s$ and $t$, even if weights are restricted to 1. Flake, Tarjan and Tsioutsiouliklis [3] also proved NP-completeness for the problem of deciding whether an IKN-community (IKN-weak-community) exists in a given graph, which implies the NP-completeness of the above s-t IKN-community (IKN-weak-community) problem without restrictions on weights.

The following proposition, which claims that an IKN-community can be constructed from a set satisfying only Condition 1′ or only Condition 2, may help the task of Web community discovery. The constructed set might be the whole set or the empty set.


PROPOSITION 4. (1) For any subset $C \subseteq V$ that satisfies Condition 1′, the minimum IKN-community including $C$ can be constructed.

(2) For any subset $C \subseteq V$ that satisfies Condition 2, the maximum IKN-community included in $C$ can be constructed.


PROOF. (1) Let $B_{out}(C)$ denote the set of vertices in $V - C$ that do not satisfy Condition 2. Let $C_0 = C$, and define $C_{i+1} = C_i \cup B_{out}(C_i)$. Then $C_{i+1}$ also satisfies Condition 1′ whenever $C_i$ does. Since $C_0$ satisfies the condition, all $C_i$ satisfy it. The sequence $C_0, C_1, \ldots$ is monotonically increasing and $|V| < \infty$, so there exists $n_0$ such that $C_n = C_{n_0}$ for all $n \ge n_0$. Then $C_{n_0}$ is an IKN-community because $B_{out}(C_{n_0}) = \emptyset$. Since any IKN-community including $C_i$ trivially contains $B_{out}(C_i)$, $C_{n_0}$ is the minimum IKN-community including $C$.

(2) This can be proved similarly. □
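The proof of Proposition 4 (1) is constructive, and it translates directly into a fixed-point loop. The sketch below implements part (1) as stated and, for part (2), the analogous construction that we read into "similarly": repeatedly remove the members that violate Condition 1′. The dict-of-dicts graph and the function names are our illustrative assumptions; each function presumes its input satisfies the stated precondition and does not check it.

```python
def from_edges(edges):
    """Unit-weight undirected graph as a symmetric dict of dicts w[u][v] = w_uv."""
    w = {}
    for u, v in edges:
        w.setdefault(u, {})[v] = 1
        w.setdefault(v, {})[u] = 1
    return w


def split(w, u, C):
    """(weight from u into C, weight from u into V - C)."""
    inside = sum(x for v, x in w[u].items() if v in C)
    return inside, sum(w[u].values()) - inside


def min_ikn_containing(w, C):
    """Proposition 4 (1). Assumes C satisfies Condition 1'.
    Repeatedly adds B_out(C), the outside vertices violating Condition 2."""
    C = set(C)
    while True:
        b_out = {u for u in w if u not in C
                 and split(w, u, C)[0] > split(w, u, C)[1]}
        if not b_out:
            return C
        C |= b_out


def max_ikn_contained(w, C):
    """Proposition 4 (2), via the analogous construction. Assumes C satisfies
    Condition 2. Repeatedly removes the members violating Condition 1'."""
    C = set(C)
    while True:
        b_in = {u for u in C if split(w, u, C)[0] <= split(w, u, C)[1]}
        if not b_in:
            return C
        C -= b_in


# Two triangles joined by (2, 3), plus vertex 6 attached to 0 and 1.
g = from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3),
                (6, 0), (6, 1)])
print(min_ikn_containing(g, {0, 1, 2}))       # {0, 1, 2, 6}
print(max_ikn_contained(g, {0, 1, 2, 3, 6}))  # {0, 1, 2, 6}
```

Starting from $\{0, 1, 2\}$, vertex 6 violates Condition 2 and is pulled in. Starting from $\{0, 1, 2, 3, 6\}$, vertex 3 violates Condition 1′ and is dropped. Both runs reach the same IKN-community.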


If two sets $C_1$ and $C_2$ both satisfy Condition 1′, then $C_1 \cup C_2$ also satisfies that condition, so we can also construct the minimum IKN-community containing both of them. Similarly, we can find the maximum IKN-community contained in both of two sets that satisfy Condition 2. Therefore, new IKN-communities can be found from a given set of IKN-communities (see Figure 4).
Figure 4: $C_3$ is the minimum IKN-community containing $C_1 \cup C_2$. $C_1$ is the maximum IKN-community contained in $C_3 \cap C_4$.


3. GRAPH PARTITIONING BY 
COMMUNITY TOPOLOGY 


3.1 Definition 


Let $\mathcal{C}$ be a class of subsets of $V$ in a graph $G = (V, E)$. For each vertex $u$, define the class $U(u)$ of neighborhoods of $u$ as the set of subsets in $\mathcal{C}$ that contain $u$, that is, $U(u) = \{C : C \in \mathcal{C}, u \in C\}$. Consider the following equivalence relation $R$:
$$u \mathrel{R} v \iff U(u) = U(v).$$
We call partitioning by this equivalence relation partitioning by $\mathcal{C}$. In this paper, we consider partitioning by the class of IKN-communities.
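Partitioning by a class $\mathcal{C}$ amounts to grouping vertices by their signature $U(u)$. A minimal sketch, assuming the class is given as a plain list of vertex sets (so each community is identified by its list index); the function name is hypothetical.

```python
from collections import defaultdict


def partition_by_class(vertices, communities):
    """Partition vertices by u R v <=> U(u) = U(v), where U(u) is the set of
    communities (identified by list index) that contain u."""
    classes = defaultdict(list)
    for u in vertices:
        signature = frozenset(i for i, C in enumerate(communities) if u in C)
        classes[signature].append(u)
    return list(classes.values())


# The IKN-communities of two triangles {0,1,2}, {3,4,5} joined by one edge.
communities = [set(), {0, 1, 2}, {3, 4, 5}, {0, 1, 2, 3, 4, 5}]
print(partition_by_class(range(6), communities))  # [[0, 1, 2], [3, 4, 5]]
```

The trivial communities $\emptyset$ and $V$ never separate two vertices, so only the nontrivial IKN-communities shape the partition. Nested communities, for example $\{0, 1\} \subset \{0, 1, 2, 3\}$, would split the larger set into $\{0, 1\}$ and $\{2, 3\}$.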

By regarding each equivalence class as one vertex, we can generate a contracted graph. We can obtain a higher-level partitioning by the class of IKN-communities in the contracted graph. This procedure can be repeated until every equivalence class consists of only one vertex.

In this paper, we propose a hierarchical partitioning by the class of IKN-communities that repeats partitioning and contraction.
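The contraction step can be sketched as follows. The text does not say how edge weights are set in the contracted graph. Here we assume, as a labeled choice, that the weight between two contracted vertices is the sum of the original weights between the two classes and that edges inside a class disappear. The graph is again a symmetric dict of dicts, and contracted vertices are numbered by their position in `classes`.

```python
def contract(w, classes):
    """Contract each equivalence class into one vertex (numbered by its
    position in `classes`). Assumption: the weight between two contracted
    vertices is the sum of original weights between the two classes;
    edges inside a class are dropped."""
    label = {u: i for i, cls in enumerate(classes) for u in cls}
    cw = {i: {} for i in range(len(classes))}
    for u in w:
        for v, x in w[u].items():
            a, b = label[u], label[v]
            if a != b:
                cw[a][b] = cw[a].get(b, 0) + x  # arc (u, v) contributes to (a, b)
    return cw


# A 4-cycle 0-2-3-1-0 with classes {0, 1} and {2, 3}: two edges cross the classes.
w = {0: {1: 1, 2: 1}, 1: {0: 1, 3: 1}, 2: {0: 1, 3: 1}, 3: {1: 1, 2: 1}}
print(contract(w, [[0, 1], [2, 3]]))  # {0: {1: 2}, 1: {0: 2}}
```

Because each undirected edge is stored once in each direction, the result stays symmetric. Alternating a partitioning step with this contraction yields the hierarchy, and it stops once no class contains more than one vertex.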

See Figure 5 for an example of partitioning by the class of 
IKN-communities and a contracted graph for a partitioning. 




Figure 5: Left: Partitioning by the class of IKN-communities (every edge weight is one). Right: Contracted graph of the left graph.


Figure 3: Right: Optimal solution of Problem 2 (value of objective function: 1). Left: Optimal solution of Problem 1 (value of objective function: 2). Bold numbers represent $p_u$, and italic numbers represent $d_{uv}$. Note that $d_{uv}$ for each edge $(u, v)$ directed to the left, which is 0, is omitted.


3.2 Efficient Algorithm for Partitioning 


Considering the NP-completeness of deciding whether an s-t IKN-community exists for two arbitrary vertices $s$ and $t$, partitioning by the whole class of IKN-communities is not practical. Here, we propose a method that finds a subclass of IKN-communities efficiently and thus generates a coarse partitioning by the subclass.

Maximum flow algorithms are efficient, but IKN-communities might not be found by them, as noted in Remark 1. However, the sets found by them are always s-t quasi-IKN-communities $C$, so we only have to check that the source $s$ satisfies Condition 1′ and the sink $t$ satisfies Condition 2 to decide whether $C$ is an IKN-community or not. Here, we consider the problem of finding the subclass of IKN-communities that can be found by an s-t maximum flow algorithm for all pairs of vertices $s$ and $t$.

For an undirected graph $G(V, E)$ with $n$ vertices, it is known [10] that there exists a set $D$ of $n - 1$ cuts with the following property.

Property 1. For any two vertices $s$ and $t$, there exists an s-t minimum cut in $D$.

Furthermore, there exists a set $D$ that has Property 1 and is represented by a tree $T$, called a Gomory-Hu tree [10], which is composed of the vertices in $V$ and $n - 1$ edges representing the $n - 1$ cuts in $D$. A Gomory-Hu tree can be created efficiently by using a maximum flow algorithm $n - 1$ times. We propose a method that finds IKN-communities among the sets partitioned by the $n - 1$ minimum cuts represented by a Gomory-Hu tree.

Note that there might be minimum cuts that are not represented by a given Gomory-Hu tree, and there might exist other Gomory-Hu trees representing a different set of minimum cuts for the same graph. Thus, our proposed method does not check whether $S(f_{s,t})$ is an IKN-community for all $s, t \in V$. In order to raise the possibility of finding IKN-communities, we adopt the following heuristic.

Define $T(f_{s,t})$ as the set of vertices from which $t$ is reachable in the residual graph for a flow $f_{s,t}$. Then $T(f_{s,t})$ can also be proved to be a t-s quasi-IKN-community. Note that $T(f_{s,t})$ can coincide with $V - S(f_{s,t})$ but differs from it in general. In the process of constructing a Gomory-Hu tree, we check both $S(f_{s,t})$ and $T(f_{s,t})$ after every computation of a maximum flow $f_{s,t}$, and if either $S(f_{s,t})$ or $T(f_{s,t})$ is an IKN-community, we adopt the one that is an IKN-community as the minimum cut.
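The check in this heuristic needs only two vertices per candidate set. Since $S(f_{s,t})$ is an s-t quasi-IKN-community and $T(f_{s,t})$ is a t-s quasi-IKN-community, $S$ is an IKN-community exactly when $s$ satisfies Condition 1′ and $t$ satisfies Condition 2, and the roles swap for $T$. A sketch, assuming $S$ and $T$ were already computed from the same maximum flow; the function names are hypothetical, and preferring $S$ when both qualify is our own tie-break, since the text does not specify one.

```python
def inside_outside(w, u, C):
    """(weight from u into C, weight from u into V - C)."""
    inside = sum(x for v, x in w[u].items() if v in C)
    return inside, sum(w[u].values()) - inside


def is_ikn_by_endpoints(w, C, member, outsider):
    """C is assumed to be a quasi-IKN-community for (member, outsider), so by
    Proposition 2 only these two vertices need checking: `member` must satisfy
    Condition 1' and `outsider` must satisfy Condition 2."""
    a, b = inside_outside(w, member, C)
    c, d = inside_outside(w, outsider, C)
    return a > b and c <= d


def pick_cut(w, s, t, S, T):
    """Given S = S(f_st) and T = T(f_st) from the same maximum flow, return a
    side that is an IKN-community, or None if neither is.
    Preferring S when both qualify is an assumption."""
    if is_ikn_by_endpoints(w, S, s, t):
        return S
    if is_ikn_by_endpoints(w, T, t, s):  # T is a t-s quasi-IKN-community
        return T
    return None


g = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
     3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
print(pick_cut(g, 0, 5, {0, 1, 2}, {3, 4, 5}))  # {0, 1, 2}
```

Each check costs time proportional to the degrees of $s$ and $t$, which is negligible next to the maximum flow computation it follows.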

For example, the leftmost graph in Figure 6 has the Gomory-Hu trees shown by the bottom graphs in the figure, which represent different sets of minimum cuts; these cuts are also drawn with broken lines in the corresponding top graphs of the figure. Note that the middle Gomory-Hu tree does not have the minimum cut $(\{0, 1\}, \{2, 3, 4, 5\})$, but the rightmost one does, where the set $\{2, 3, 4, 5\}$ is an IKN-community. For this graph, the rightmost Gomory-Hu tree is the one constructed by the above heuristic.

Figure 6: Example of Gomory-Hu trees representing different sets of minimum cuts.


4. PRELIMINARY EXPERIMENTS 
Table 1 (columns): Level | #Vertex | #Edge | #(Isolated vertex) | #(Connected component) | #IKN-community | #(Equivalence class)


Table 1: The number of vertices, edges, isolated vertices, and connected components (that are not isolated vertices) in two graphs created by the procedure Subgraph, and the number of found IKN-communities (that are not connected components themselves) and equivalence classes (that are not isolated vertices) partitioned by the class of the found IKN-communities.


We conducted experiments on graph partitioning by a subclass of IKN-communities, using subgraphs of the WWW composed of pages related to given keywords. For given keywords, we construct a subgraph by using the Subgraph procedure proposed by Kleinberg [6], which retrieves $t$ pages by using a search engine and adds all pages that are linked from or link to at least one of them, although the number of pages linking to each retrieved page is restricted to at most $d$. In our experiment, we used the search engine Google (www.google.co.jp) and set $t$ and $d$ to 200 and 50, respectively. Note that, as Kleinberg did, we removed all intrinsic links [6], namely, links between pages in the same domain. By the procedure Subgraph, we obtained the graphs for the two keywords “learning theory” and “jaguar”, whose numbers of vertices, edges, isolated vertices and connected components are shown in Table 1.

(#pages:1590)
0 tip.psychology.org (Theory Into Practice (TIP))
2 www.usask.ca/education/coursework/802papers/mergel/brenda.htm (Learning Theories of Instructional Design)
4 www.funderstanding.com/about_learning.cfm (Funderstanding - About Learning)
other 928 pages(45 pages)
1 tip.psychology.org/theories.html ( TIP: The Theories )
other 10 pages(0 pages)
3 www.ozline.com/learning/theory.html (ozline - Working the Web for Education)
other 58 pages(0 pages)
7 www.exploratorium.edu/IFI/resources/research/constructivistlearning.html (Constructivist Learning Theory)
other 2 pages(0 pages)
8 www.educationau.edu.au/archives/cp/04.htm (Learning Theories)
other 7 pages(0 pages)
other 53 partitions

(#pages:656)
10 www.learningtheory.org (COLT: Computational Learning Theory)
59 www.crm.es/Activities/Act2003-04/LearningTheory/LearningTheoryhome.htm (Mathematical Foundation of Learning Theory)
61 theory.lcs.mit.edu/COLT-98
other 462 pages(9 pages)
19 plato.stanford.edu/entries/learning-formal (Formal Learning Theory)
other 2 pages(0 pages)
43 www.cis.udel.edu/ case/colt.html ( John Case’s COLT Page)
other 19 pages(0 pages)
51 liinwww.ira.uka.de/bibliography/Ai/colt.html (Bibliography on Computational and Algorithmic Learv...)
other 8 pages(0 pages)
52 www.esat.kuleuven.ac.be/sista/natoasi/Itp2002.html (NATO-ASI LTP2002)
other 15 pages(0 pages)
other 15 partitions

(#pages:179)
35 w4.evectors.it/itEntDirectory/topic?topic=learningtheory (w4)
106 www.unimelb.edu.au/HB/subjects/468-110.html (468-110 Advanced Learning Theory)
other 125 pages(0 pages)
other 6 partitions

(#pages:1)
40 psych.fullerton.edu/jmearns/book3.htm (Applications of a Social Learning Theory of Personv...)

(#pages:6)
42 www.acm.org/sigchi/chi96/proceedings/papers/Soloway/estxt.htm (Learning Theory in Practice: Case Studies of Learv...)
other 4 pages(0 pages)
other 1 partitions

other 265 partitions

Figure 7: Result for keywords “learning theory.”
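The Subgraph procedure described above can be sketched on an in-memory link graph. This is only an illustration under stated assumptions. The search engine results are given as a plain list. `out_links` and `in_links` are hypothetical dictionaries standing in for crawling and link queries. Pages are scheme-less URLs as in Figure 7. "Same domain" is approximated by "same host name". The surviving links are kept as undirected edges, matching the graph model of Sec. 2.1.

```python
from urllib.parse import urlparse


def host(page):
    """Host name of a scheme-less URL such as 'tip.psychology.org/theories.html'."""
    return urlparse("http://" + page).hostname


def subgraph(root, out_links, in_links, d):
    """Sketch of Kleinberg's Subgraph procedure on an in-memory link graph.

    root:      the t pages returned by the search engine (here simply a list)
    out_links: hypothetical dict page -> pages it links to
    in_links:  hypothetical dict page -> pages linking to it
    d:         at most d in-linking pages are added per root page
    Intrinsic links (both ends on the same host) are dropped; the remaining
    links become undirected edges."""
    base = set(root)
    for p in root:
        base.update(out_links.get(p, []))
        base.update(in_links.get(p, [])[:d])  # keep at most d in-linking pages
    edges = set()
    for u in base:
        for v in out_links.get(u, []):
            if v in base and host(u) != host(v):
                edges.add(frozenset((u, v)))
    return base, edges


out_links = {"a.com/1": ["a.com/2", "b.org/x"], "b.org/x": ["a.com/1"],
             "c.net/p": ["a.com/1"], "d.net/q": ["a.com/1"], "e.net/r": ["a.com/1"]}
in_links = {"a.com/1": ["b.org/x", "c.net/p", "d.net/q", "e.net/r"]}
pages, edges = subgraph(["a.com/1"], out_links, in_links, d=2)
print(sorted(pages))  # ['a.com/1', 'a.com/2', 'b.org/x', 'c.net/p']
print(len(edges))     # 2: a.com/1-b.org/x and a.com/1-c.net/p
```

With the settings reported above, `root` would hold the top 200 search results and `d` would be 50. The intrinsic link from a.com/1 to a.com/2 is dropped, while page a.com/2 itself stays in the vertex set.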

We hierarchically partitioned the obtained graphs by applying our proposed graph partitioning algorithm twice. That is, in the level-1 partitioning, we used the contracted graph created by regarding each equivalence class obtained in the level-0 partitioning as one vertex. As shown in Table 1, 117 and 147 IKN-communities were found in the level-0 partitioning, and 12 and 20 IKN-communities were found in the level-1 partitioning.

The results of hierarchical partitioning for the two graphs are shown in Figure 7 and Figure 8. We ranked each partition by the best (that is, numerically smallest) Google rank among the pages it contains. Under this ranking, the figures show the top 5 level-1 partitions and, for each of them, its top 5 level-0 partitions. A page is listed with its URL and title if its Google rank is within 200 and is among the top 3 members of its level-0 equivalence class. The number of unlisted pages is shown after the word “other”, followed in parentheses by the number of those pages whose Google rank is within 200.
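The ranking and listing rules behind Figures 7 and 8 can be expressed in a few lines. This is an illustrative sketch: `rank` is assumed to map a page to its 0-based Google rank (the figures start at rank 0), pages missing from `rank` are treated as unranked, "within 200" is read as rank < 200, and all function names are hypothetical.

```python
import math


def best_rank(pages, rank):
    """Best (numerically smallest) Google rank in a partition; unranked pages count as infinity."""
    return min((rank.get(p, math.inf) for p in pages), default=math.inf)


def top_partitions(partitions, rank, k=5):
    """The k partitions with the best Google rank, best first."""
    return sorted(partitions, key=lambda part: best_rank(part, rank))[:k]


def summarize_class(eq_class, rank, limit=200, per_class=3):
    """Split one level-0 class into listed pages and the "other N pages(M pages)" counts.

    A page is listed if its rank is below `limit` and it is among the
    `per_class` best-ranked members of the class.
    """
    ranked = sorted((p for p in eq_class if rank.get(p, math.inf) < limit), key=rank.get)
    shown = ranked[:per_class]
    rest = [p for p in eq_class if p not in shown]
    other_ranked = sum(1 for p in rest if rank.get(p, math.inf) < limit)
    return shown, len(rest), other_ranked


if __name__ == "__main__":
    rank = {"a": 0, "b": 120, "c": 152, "d": 180, "e": 2, "f": 250}
    classes = [{"a", "b", "c", "d", "x"}, {"e", "f"}, {"y"}]
    for cls in top_partitions(classes, rank, k=2):
        print(*summarize_class(cls, rank))
    # ['a', 'b', 'c'] 2 1
    # ['e'] 1 0
```

The same `top_partitions` helper can rank level-1 partitions if each is passed as the set of all pages in its level-0 classes.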
Figure 7 shows the result for the keywords “learning theory.” The term mainly refers to two subjects: learning theory in education and psychology, and computational learning theory. The second level-1 partition consists of pages related to computational learning theory, and the other level-1 partitions consist of pages related to educational learning theory, though all of these except the first are small. The number preceding each URL is its Google rank; as these ranks show, the top 10 pages of the Google search result are biased toward educational learning theory. Our result indicates that partitioning by IKN-communities can be used to produce balanced search results.

(#pages:613)
0 www.jaguar.com/global/default.htm (Jaguar Cars)
120 jaguar.anort.com
152 chulkov.com/jaguar/jaguar.htm (JAGUAR X-Type Best in the world the automobi...)
other 66 pages(0 pages)
2 www.jaguar-racing.com
other 34 pages(0 pages)
3 www.jaguar-racing.com/uk/flash
other 6 pages(0 pages)
6 www.jaguarcars.com (Jaguar Cars)
other 44 pages(0 pages)
7 www.jaguarcars.com/jp
8 www.jag-lovers.org (Jag-lovers - here to provide everything for the Ja...)
9 www.jag-lovers.org/brochures (Jag-lovers unique original Brochures and Adverts c...)
other 207 pages(30 pages)
other 33 partitions

(#pages:1)
1 www.jaguar.com/de/de/home.htm (home)

(#pages:984)
4 www.apple.com/macosx (Apple - Mac OS X)
5 www.apple.com/macosx/overview (Apple - Mac OS X - Overview)
44 www.macattorney.com/tutorial.html (OS X 10.2 Jaguar Troubleshooting)
other 522 pages(7 pages)
11 www.schrodinger.com (Schrödinger)
other 12 pages(0 pages)
12 www.schrodinger.com/Products/jaguar.html (Schrödinger: Jaguar Program)
other 10 pages(0 pages)
16 www.jaguarmodels.com (Jaguar Models - Main Page (resin model kits))
other 4 pages(0 pages)
35 www.oneworldjourneys.com/expeditions/jaguar (One World Journeys - Jaguar: Lord of the Mayan Jun...)
other 5 pages(0 pages)
other 25 partitions

(#pages:7)
19 dspace.dial.pipex.com/agarman/jaguar.htm (Jaguar)
other 1 pages(0 pages)
37 www.bluelion.org/jaguar.htm (Panthera onca)
other 2 pages(0 pages)
other 1 partitions

(#pages:1)
20 dspace.dial.pipex.com/agarman/bco/jaguar.htm (Jaguar)

other 443 partitions

Figure 8: Result for keyword “jaguar.”

Figure 8 shows the result for the keyword “jaguar.” The top two level-1 partitions are related to the automobile, the third consists mainly of pages related to computers, and the fourth and fifth are related to the animal. Within the computer partition, the first level-0 partition, which contains the majority of its members, is related to Mac OS X; the second and third level-0 partitions consist of pages of a software company that produces a package called Jaguar; and the fourth and fifth level-0 partitions, which are small, consist of pages unrelated to computers.
5. CONCLUDING REMARKS


Our method, which finds IKN-communities in the course of constructing a Gomory-Hu tree, is computationally efficient. Constructing the tree requires $n - 1$ maximum flow computations, each of which takes $O(nm \log n)$ time with the maximum flow algorithm developed by Sleator and Tarjan [9], so the method runs in $O(mn^2 \log n)$ time*, where $m$ is the number of edges and $n$ is the number of vertices. However, many IKN-communities may not be found by this method, and as a result the partitions it produces may be too coarse. An algorithm that efficiently finds more IKN-communities should therefore be developed. One candidate is a method that solves Problem 1 using some optimization technique.
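The $n - 1$ cuts that the method examines can be generated without explicit vertex contraction in the style of Gusfield's simplification of the Gomory-Hu construction. The sketch below is not our implementation: it swaps in a plain Edmonds-Karp maximum flow on an adjacency matrix, so it does not attain the time bound above, and the function names are hypothetical. Each recorded cut is a minimum cut between vertex i and its current parent p[i]; testing each side of a cut against the IKN-community conditions is left out.

```python
from collections import deque


def max_flow_min_cut(n, cap, s, t):
    """Edmonds-Karp maximum flow on an n x n capacity matrix.

    Returns (flow value, set of vertices on the s side of a minimum s-t cut).
    """
    residual = [row[:] for row in cap]
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: the vertices reached from s form the source side.
            return flow, {v for v in range(n) if parent[v] != -1}
        # Find the bottleneck capacity along the path, then augment along it.
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


def gusfield_cuts(n, edges):
    """n - 1 minimum cuts computed in Gusfield's contraction-free style.

    edges: (u, v, capacity) triples of an undirected graph on vertices 0..n-1.
    Returns (i, p[i], cut value, side containing i) for i = 1..n-1.
    """
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
        cap[v][u] += c
    parent = [0] * n
    cuts = []
    for i in range(1, n):
        value, side = max_flow_min_cut(n, cap, i, parent[i])
        cuts.append((i, parent[i], value, side))
        # Later vertices that share i's parent and fall on i's side now hang below i.
        for j in range(i + 1, n):
            if j in side and parent[j] == parent[i]:
                parent[j] = i
    return cuts


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
    edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1)]
    i, p, value, side = min(gusfield_cuts(6, edges), key=lambda cut: cut[2])
    print(i, p, value, sorted(side))  # 3 2 1 [3, 4, 5]
```

In this toy graph the lightest of the five cuts separates the two triangles, which is the kind of vertex set the method would then test against the IKN-community conditions.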

* This computational time can be reduced by using a faster algorithm [5], though its time bound is more complicated.

The efficient method based on edge betweenness developed by Girvan and Newman [4] may also be used to find IKN-communities. They conducted experiments on computer-generated graphs in which 128 vertices are partitioned into 4 groups of 32 vertices, and each vertex is connected to 16 other vertices by randomly generated edges, $k$ of which go to vertices in other groups and $16 - k$ to vertices in the same group. According to their reported results, their algorithm extracted the 4 groups completely when $k < 6$. Note that the 4 groups in these graphs are IKN-communities when $k < 7$. Their method does not guarantee that the vertex subsets it produces satisfy the conditions of IKN-communities at every vertex, and it cannot find overlapping IKN-communities. However, it runs in $O(mn^2)$ time and may find larger IKN-communities, so we think that using it to find IKN-communities deserves further study.
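Test graphs of the kind used by Girvan and Newman can be generated as sketched below. Choosing edge probabilities so that each vertex has $16 - k$ intra-group and $k$ inter-group neighbours on average is an assumption of this sketch (exact degrees are not enforced), and the function names and the `seed` parameter are hypothetical.

```python
import random


def gn_benchmark(k, groups=4, size=32, degree=16, seed=0):
    """Random graph with `groups` groups of `size` vertices.

    Each vertex has expected degree `degree`, of which an expected k edges go to
    other groups; only expectations are matched, not exact degrees.
    """
    rng = random.Random(seed)
    n = groups * size
    p_in = (degree - k) / (size - 1)  # expected intra-group degree: degree - k
    p_out = k / (n - size)            # expected inter-group degree: k
    group = [v // size for v in range(n)]
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if group[u] == group[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return n, group, edges


def mean_degrees(n, group, edges):
    """Average numbers of intra-group and inter-group neighbours per vertex."""
    inside = outside = 0
    for u, v in edges:
        # Each edge adds one neighbour to both of its endpoints.
        if group[u] == group[v]:
            inside += 2
        else:
            outside += 2
    return inside / n, outside / n


if __name__ == "__main__":
    print(mean_degrees(*gn_benchmark(k=6)))
```

With $k = 6$ the printed averages should be close to 10 and 6; varying $k$ around the thresholds mentioned above gives graphs on which a community-finding method could be compared.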